Abstract

The purpose of this study was to assess dimensionality of attitudinal data arising from 
unfolding models for discrete data, and to compute rough estimates of item and individual 
parameters for use as starting values in other estimation programs. One- and two-dimensional 
simulated test data were analyzed in this study. Results of limited analyses performed so far have 
shown that linear principal components analysis of unfolding data provides a reliable estimate of 
the underlying dimensionality. For every unfolding dimension, there are two linear principal
components. In addition, pattern coefficients of items on the two principal components
associated with each dimension form a fan-shaped simplex pattern resembling a semicircle. Arc 
length of an item along the semicircle can serve as an estimate of item location. Length of an 
item in the item space spanned by the two linear components can serve as an estimate of item 
discrimination. For individuals, arc lengths computed using the individual scores on the two 
principal components could serve as estimates of individuals' location parameters. For
two-dimensional test data, an algorithm has been developed to identify component pairs associated
with each dimension and to correctly classify items into dimensional groups.
Attitudinal Data: Dimensionality and Start Values for Estimating Item Parameters

Increasingly, attitudinal data are being used along with achievement data to assess
educational outcomes. For example, it is important to measure teacher and pupil attitudes or
opinions about mathematics when assessing the effectiveness of a mathematics curriculum in
the elementary grades. This is typically accomplished by having respondents indicate the degree to
which they agree (or disagree) with a series of attitude statements.

In the framework of unfolding theory it is presumed that a subject endorses an item
(statement) if the content of the item matches the subject's opinion. Psychometrically speaking,
we would expect more endorsement to the extent that a subject is located close to an item in a
latent attitudinal space. Let $\delta_i$ denote the item location on the latent continuum and $\theta_j$ denote
the subject's location on the same latent continuum. Then the degree of agreement with the
statement increases as $|\theta_j - \delta_i|$ approaches zero, and the degree of disagreement with the item
increases as this difference $|\theta_j - \delta_i|$ increases, resulting in a non-monotone, bell-shaped,
single-peaked response function for the given item.

Several stochastic models have been recently developed to model attitudinal data arising 
from an unfolding model. Parametric item response models include the squared simple logistic 
model (Andrich, 1988), the PARELLA model (Hoijtink, 1990, 1991), the hyperbolic cosine model
(Andrich & Luo, 1993; Luo, 2001), the graded unfolding model (Roberts & Laughlin, 1996), and
the generalized graded unfolding model (Roberts, Donoghue, & Laughlin, 2000). Nonparametric
item response models also have been proposed for data resulting from an ideal point response
process (Cliff, Collins, Zatkin, Gallipeau, & McCormick, 1988; van Schuur, 1984).

Although there are several models available to analyze attitude data resulting from an 
ideal point process, there are no well-established procedures to assess model data fit, which is 
fundamental for proper interpretation of resulting estimates. If a unidimensional model is applied 
to analyze data, for example, it is important to establish a unidimensional trait underlying the 
data before applying the model to estimate an individual's scale value. Even though many 
methodologies have been developed for dimensionality assessment of achievement item data, 
they cannot be readily applied to attitude item data because models for attitudinal data implement
single-peaked, non-monotone response functions, while models for achievement data follow
cumulative monotone response functions.

Davison (1977) showed that when item responses follow a simple metric unfolding 
model, the principal components of the inter-item correlation matrix will suggest two primary 
components, and the loadings of the components form a simplex pattern. Davison used a simple 
unidimensional model to demonstrate his results, namely, the squared distance model, where 
item responses are linearly related to continuous squared distances between item and person 
scores. A paper by van Schuur and Kiers (1994) has provided a comprehensive summary of
the relationship between unfolding models and linear factor analysis. They showed that when
unidimensional data arising from an unfolding model are analyzed using traditional linear factor
analysis, one obtains two independent factors instead of one bipolar factor. This is because, in
linear factor analysis, observed variables are linearly related to the underlying latent variables or
dimensions, whereas in unfoldable data, observed variables have a non-linear and
non-monotonic relationship to the underlying dimensions. Ross and Cliff (1964)
mathematically proved that if the underlying dimensionality of unfolding data is r, then principal 
components analysis of such data will give a factor solution of dimension r+1, using a squared 
distance model. Maraun and Rossi (2001) have demonstrated that for unfoldable data, the 
unidimensional model is equivalent to the unidimensional quadratic factor model, and that items 
conforming to r-dimensional space will have 2r-dimensional representation using linear factor 
analyses. 
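The doubling of linear components can be motivated with a short worked expansion. The sketch below assumes a noise-free squared distance response with an arbitrary constant $c$; it is a simplified illustration and not a formula taken from the studies cited above.

```latex
\begin{align*}
x_{ij} &= c - (\theta_j - \delta_i)^2 \\
       &= (c - \delta_i^2) + 2\,\delta_i\,\theta_j - \theta_j^2 .
\end{align*}
```

Under this assumption each item is a linear function of two person variables, $\theta_j$ and $\theta_j^2$, so a linear analysis needs two components to represent one unfolding dimension. With $r$ dimensions and a single common distance term, the squared terms collapse into one variable $\sum_k \theta_{kj}^2$, which would be consistent with the $r+1$ result; if each dimension carries its own item-specific weight, each $\theta_{kj}^2$ enters separately, which would be consistent with the $2r$ result.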

All these studies have used simple, continuous distance models for simulating unfolding 
data, and for the most part, have employed only principal components analysis with unities in the 
diagonal for dimensionality assessment. More importantly, these studies were highly limited in 
their empirical investigation of dimensional structure of data. 

The objectives of the present study were two-fold: (1) to conduct a systematic empirical 
investigation of dimensional structure of unfolding data using linear factor analysis when data 
are modeled via a discrete unfolding IRT model; (2) to establish start values of item and 
individual parameters for use in estimation programs for one- and two-dimensional data.

This study is not yet complete. We conducted a small dimensionality study to get a sense
of the effect of different variables and types of factor analysis solutions on the assessment of
dimensionality of unfolding data. Since the results were promising, we conducted a second small
study on parameter estimation for one-dimensional unfolding data. We have just begun analysis of
two-dimensional data. A full-scale simulation study encompassing both dimensionality and
parameter estimation will be undertaken shortly.

The Dimensionality Study 

Given the limited empirical investigation of different approaches to determine the 
dimensional structure of unfolding data, the first study investigated the relationship between the 
true dimensionality of unfolding data and the number of linear factors generated for one- and 
two-dimensional tests. 

The unidimensional simulation results presented in this study are based on the
generalized graded unfolding model (GGUM) due to Roberts, Donoghue, and Laughlin (2000) as
given by:

$$
P(Z_i = z \mid \theta_j) = \frac{\exp\{\alpha_i[z(\theta_j - \delta_i) - \sum_{k=0}^{z}\tau_{ik}]\} + \exp\{\alpha_i[(M - z)(\theta_j - \delta_i) - \sum_{k=0}^{z}\tau_{ik}]\}}{\sum_{w=0}^{C}\left(\exp\{\alpha_i[w(\theta_j - \delta_i) - \sum_{k=0}^{w}\tau_{ik}]\} + \exp\{\alpha_i[(M - w)(\theta_j - \delta_i) - \sum_{k=0}^{w}\tau_{ik}]\}\right)} \qquad (1)
$$

where $Z_i$ is the observable response to attitude statement $i$, taking values $0, 1, 2, \ldots, C$. A response
value of $C$ denotes the strongest level of agreement and 0 denotes the strongest level of disagreement,
and $M = 2C + 1$. Given a subject's attitude position on the latent continuum ($\theta_j$), Equation 1
describes the probability that the response of person $j$ to item $i$ falls into a particular category as
a function of the signed distance ($\theta_j - \delta_i$) between the person's attitude position $\theta_j$ and the
location of the item $\delta_i$. $\alpha_i$ is the discrimination parameter of the item, and $\tau_{ik}$ ($k = 0, 1, 2, \ldots, C$) are
the threshold parameters, which are symmetric about the point $(\theta_j - \delta_i) = 0$.
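To make the category probabilities concrete, the following is a minimal sketch of the GGUM probability function for one person and one item. The function name `ggum_probs` and the convention that the threshold list starts with a zero entry for category 0 are assumptions made for illustration; they are not taken from the study's own software.

```python
import math


def ggum_probs(theta, delta, alpha, tau):
    """Return [P(Z=0), ..., P(Z=C)] for one person and one item.

    tau is [tau_0, tau_1, ..., tau_C]; tau_0 = 0 is an assumed convention.
    """
    C = len(tau) - 1
    M = 2 * C + 1
    d = theta - delta  # signed distance between person and item
    numerators = []
    cum_tau = 0.0
    for z in range(C + 1):
        cum_tau += tau[z]  # running sum of thresholds up to category z
        below = math.exp(alpha * (z * d - cum_tau))
        above = math.exp(alpha * ((M - z) * d - cum_tau))
        numerators.append(below + above)
    total = sum(numerators)
    return [n / total for n in numerators]


# Hypothetical six-category item (C = 5) located at delta = 0.
example_tau = [0.0, -1.9, -1.65, -1.4, -1.15, -0.9]
print(ggum_probs(0.0, 0.0, 1.0, example_tau))
```

Because the two exponential terms mirror each other, the probabilities depend only on the absolute distance between person and item, which is what produces the single-peaked response function.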



Individual attitude positions ($\theta_j$) for unidimensional tests were generated from
the standard normal distribution. Item discrimination parameters ($\alpha_i$) were generated from a
uniform distribution between .5 and 2. Item locations ($\delta_i$) were generated from
a uniform distribution between -2 and 2. The threshold parameter $\tau_{iC}$ was generated from a
uniform distribution between -1.4 and -.4, and the successive thresholds were
generated recursively as:

$$\tau_{i,k-1} = \tau_{ik} - .25 + \varepsilon_{i,k-1}, \quad k = 2, 3, \ldots, C, \quad \text{where } \varepsilon_{i,k-1} \sim N(0, .04).$$
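The generation scheme above can be sketched as follows. The function name, the dictionary layout, and the zero placeholder for category 0 are illustrative assumptions; the sketch also assumes that .04 in $N(0, .04)$ is a variance, so the standard deviation passed to the generator is 0.2.

```python
import random


def generate_item_parameters(n_items, C=5, rng=None):
    """Draw discrimination, location, and threshold values for n_items items."""
    rng = rng or random.Random()
    items = []
    for _ in range(n_items):
        alpha = rng.uniform(0.5, 2.0)   # discrimination
        delta = rng.uniform(-2.0, 2.0)  # location
        tau = [0.0] * (C + 1)           # tau[0] = 0 is an assumed convention
        tau[C] = rng.uniform(-1.4, -0.4)
        for k in range(C, 1, -1):       # k = C, C-1, ..., 2
            eps = rng.gauss(0.0, 0.2)   # sd 0.2, assuming .04 is a variance
            tau[k - 1] = tau[k] - 0.25 + eps
        items.append({"alpha": alpha, "delta": delta, "tau": tau})
    return items


# Usage: a reproducible 20-item test.
items = generate_item_parameters(20, C=5, rng=random.Random(1))
print(items[0])
```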



The two-dimensional results are based on a two-dimensional GGUM model, an extension
of Equation 1, referred to below as Equation 2.

Individual attitude positions ($\theta_{1j}$ and $\theta_{2j}$) for two-dimensional tests were generated from
a bivariate normal distribution with the appropriate correlation between the two dimensions. Two levels of
correlation between attitudinal dimensions were considered: 0 and 0.5. Location and threshold
parameters were independently generated for the two dimensions as explained before. In
generating item discrimination parameters $\alpha_{1i}$ and $\alpha_{2i}$, two weighting constants, one per dimension, were used to
indicate the influence of each dimension on the item. For example, if both constants equal 0.5, then both
attitudes have equal weight on the item. On the other hand, if the first constant is .75 and the second is .25, then the first
attitude influences the item more heavily than the second attitude. Item discrimination
parameters were generated as follows. Initially, $\alpha_1$ and $\alpha_2$ were each generated from a uniform
distribution between 0 and 1. Then they were linearly transformed using the two weighting constants
so that the final values of $\alpha_1$ and $\alpha_2$ each range between .5 and 2.
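The text does not say how the correlated attitude positions were drawn. One standard way to obtain a standard bivariate normal pair with correlation rho is sketched below; the function name `correlated_attitudes` is hypothetical, and the construction is a common technique swapped in for illustration, not necessarily the one used in the study.

```python
import math
import random


def correlated_attitudes(rho, rng):
    """Return (theta1, theta2), standard normal with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    theta1 = z1
    # Mix in an independent draw so that corr(theta1, theta2) equals rho.
    theta2 = rho * z1 + math.sqrt(1.0 - rho * rho) * z2
    return theta1, theta2


# Usage: attitude positions for 2000 respondents with correlation 0.5.
rng = random.Random(0)
positions = [correlated_attitudes(0.5, rng) for _ in range(2000)]
```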

Two-dimensional test items were of two types: (1) simple structure, where each item had 
nonzero loading on only one attitude continuum; and (2) mixed structure, where some test items 
had non-zero loadings on both attitude continuums (complex items). 

A six-point scale was used ("strongly disagree", "disagree", "slightly disagree", "slightly
agree", "agree", and "strongly agree") to score individual responses. For one-dimensional tests,
the probability of each categorical response for an individual on an item was computed using the
GGUM model given by Equation 1. For two-dimensional tests, the probability of each response
category for an item was computed using Equation 2. These probabilities were used to divide the
unit probability interval into six mutually exclusive and exhaustive segments, where each segment
corresponds to a particular observable response category. A random number from a uniform
distribution on [0, 1] was then generated, and the segment containing the generated number
determined the response category for the individual. Data generated by this process are referred to as the observed data.
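The segment lookup described above can be sketched in a few lines. The function name `sample_response` is hypothetical, and the example probabilities are made up for illustration rather than computed from the model.

```python
import random


def sample_response(probs, u):
    """Map a uniform draw u in [0, 1) to a category using cumulative segments."""
    cumulative = 0.0
    for category, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return category
    return len(probs) - 1  # guard against floating-point rounding at the top


# Usage with illustrative probabilities for six categories.
rng = random.Random(42)
probs = [0.05, 0.10, 0.15, 0.20, 0.30, 0.20]
response = sample_response(probs, rng.random())
```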

In order to investigate the effect of random error in the data on dimensionality
assessment, both error-containing (observed) and error-free data were considered. Error-free data were
generated by using the expected value of the response, rather than a randomly sampled category, as each individual response.
This was done by taking the expected value of responses for each individual as follows:

$$E_i = 0 \cdot P(0) + 1 \cdot P(1) + 2 \cdot P(2) + 3 \cdot P(3) + 4 \cdot P(4) + 5 \cdot P(5),$$

where $P(x)$ denotes the probability of obtaining a score $x$ for a given attitude level, obtained
using Equation 1.

Both principal components analysis and factor analysis solutions were investigated for 
determining dimensionality of given data. The sample size was fixed to 2000 subjects. Two test 
lengths were used: 20 and 80 items. 

Linear factor analysis was performed on observed data and on expected values (i.e.,
error-free data). Principal components analysis and principal axis factor analysis were each
performed. In each case, the underlying dimensionality (the number of factors) was
determined based on the eigenvalues of a bootstrapped parallel analysis as described below (Buja
& Eyuboglu, 1992).

Bootstrapped parallel analysis. A random data set was generated with the same number of
subjects and items as the given data set. This was done by independently sampling responses to
each item with replacement from the given data set. Principal components analysis and factor
analysis were performed on each random data set, and the eigenvalues were averaged over 10 repeated bootstrap samples.
The average eigenvalues of the random data were plotted against the
eigenvalues of the observed (or expected) data. Dimensionality was determined as the number of
eigenvalues above the intersection point of these two plots.
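The resampling and counting steps of this procedure can be sketched as follows. All function names are hypothetical, and the eigenvalue computation itself is deliberately left to whatever principal components or factor analysis routine is in use; the sketch only assumes that eigenvalues arrive sorted in descending order.

```python
import random


def bootstrap_columns(data, rng):
    """Resample each item (column) independently with replacement."""
    n = len(data)
    n_items = len(data[0])
    columns = []
    for i in range(n_items):
        col = [data[r][i] for r in range(n)]
        columns.append([rng.choice(col) for _ in range(n)])
    # Transpose back to a respondents-by-items layout.
    return [[columns[i][r] for i in range(n_items)] for r in range(n)]


def average_eigenvalues(eigenvalue_lists):
    """Average eigenvalues position by position over the bootstrap samples."""
    k = len(eigenvalue_lists[0])
    m = len(eigenvalue_lists)
    return [sum(e[i] for e in eigenvalue_lists) / m for i in range(k)]


def count_dimensions(observed, reference):
    """Count leading observed eigenvalues that lie above the reference curve."""
    count = 0
    for obs, ref in zip(observed, reference):
        if obs <= ref:
            break  # the two curves have crossed
        count += 1
    return count
```

Resampling each column separately keeps every item's marginal response distribution while breaking the associations among items, which is what makes the averaged eigenvalues a useful baseline.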

For given test data, the underlying latent dimensionality was determined based on four 
procedures: principal components analysis of observed data (PCOB), principal components of 
expected values (PCEX), factor analysis of observed data (FAOB), and factor analysis of 
expected values (FAEX). In all these procedures, the bootstrapped parallel analysis criterion was 
used to determine the number of underlying dimensions. 
Results of the dimensionality study

Results of dimensionality analyses of unidimensional and two-dimensional data are
displayed in Table 1. The first and second columns denote true dimensionality and the test size.
The third column denotes the number of items loading on each dimension. For example, '35-35-10'
denotes that 35 items load on attitude 1 ($\theta_1$), 35 items load on attitude 2 ($\theta_2$), and 10 items
load on both attitudes $\theta_1$ and $\theta_2$. The fourth column denotes the weight given to each attitude. For example,
'0.75, 0.25' denotes that $\theta_1$ has a weight of .75 while $\theta_2$ has a lower weight of .25. In other
words, attitude 1 influences the item to a higher degree than attitude 2. The fifth column denotes
the correlation between the attitude dimensions. The next four columns denote the estimated
dimensionality of the given test data for all four procedures (PCEX, PCOB, FAEX, FAOB) based on
the bootstrapped parallel analysis criterion.

From Table 1 it can be seen that, in all cases, the number of underlying dimensions
determined using factor analysis and principal components is overestimated. However,
the overestimation is systematic and consistent for principal
components analysis (PCOB and PCEX) but not for factor analysis (FAOB and FAEX).
Specifically, principal components analysis results generally showed that there are two linear
components for each unfolding dimension. This is also true for tests containing mixed items
and correlated attitudes. In addition, the number of estimated dimensions is the same for both
observed and expected (error-free) data. Hence, the error in the data has not played a substantial
role in determining the underlying dimensionality. For factor analysis, however, the number of
estimated dimensions is unpredictable. The last column shows eigenvalues of principal
components analysis of observed data for a randomly chosen trial. It can be seen that in
one-dimensional cases the first large eigenvalue corresponds to the unfolding dimension, and in
two-dimensional cases the first two eigenvalues (much larger than the other two) correspond to the
two unfolding dimensions.

Figures 1 through 7 show plots of factor loadings for principal components analysis of
observed data. Figure 1 shows the plot for a one-dimensional test with 20 items. It can be seen that
pattern coefficients of the two linear components formed a clear simplex pattern. That is, the plot of
loadings formed a fan-like pattern resembling a semicircle. Figures 2 to 7 show plots for a
two-dimensional test with 40 items. For a two-dimensional test there are four linear components,
resulting in six plots. Two of these plots, Figures 4 and 5, exhibited a simplex pattern
corresponding to the two latent unfolding dimensions. These figures indicate that components 1
and 4 go with dimension 1 and components 2 and 3 go with dimension 2. Examining such plots
across all tests, it was found that the simplex pattern between the associated components was
unambiguously clear when the data exhibited a simple structure. However, as the simple
structure was contaminated through the addition of mixed items and/or correlation between
attitudes, the simplexes had more scatter and resembled a unidimensional simplex pattern as the
correlation approached .5.

In two-dimensional test data, plots of pattern coefficients were extremely useful in 
identifying the pairs of components associated with each unfolding factor. For example, 
sometimes components 1 and 4 were associated with the first dimension and components 2 and 3 
with the second dimension. Other times components 1 and 3 were associated with the first 
dimension and components 2 and 4 with the second. By observing the plots one can identify the 
component pairs associated with each dimension, and items associated with the correct 
component pair. 

Based on the results of this study, it is clear that principal components analysis of 
observed data with bootstrapped parallel analysis criterion to determine the number of 
underlying factors provided reliable evidence about the underlying dimensionality. There are two
linear components associated with each unfolding dimension. Plots together with eigenvalues
determine the dimensionality of data. Plots of pattern coefficients help to identify the factor pairs 
and structure in two-dimensional tests. 

It appears that the simplex pattern of pattern coefficients could be used to determine a rough
scale of items on the underlying latent continuum. These values could serve as rational starting
values for item parameters of the underlying model when iterative parameter estimation
methods are used.

The Study to Determine Start Values for Estimation 
Unidimensional Case (d=1) 

In the case of unidimensional unfolding data, it has been shown that linear principal
components analysis of such data gives rise to two significant linear components and that the
pattern coefficients of items corresponding to those components form a simplex pattern. That is,
pattern coefficients formed a fan-like pattern resembling a semicircle. It is proposed that items
along the semicircle may be ordered to approximately determine their position on the latent
continuum. To do this, the arc length is computed from a fixed point on the coordinate axis for
each item, and these lengths are used as estimates of item locations on the latent continuum (i.e.,
$\delta$ values). These steps are listed in Algorithm 1 below.

Similarly, respondents can also be ordered based on their scores on the two principal 
components forming the factor space. The arc length of each respondent is computed from a 
fixed point on the coordinate axis and respondents are ordered according to the arc length to 
determine their approximate location on the latent continuum. The algorithm for computing the 
arc length for items (or individuals) is described below. Respondents' positions computed in this
manner can serve as estimates of their attitudes ($\theta_j$). The item discrimination parameter ($\alpha_i$) can also be
estimated from the length of the item vector projected on the two corresponding components.

Algorithm 1: An algorithm to determine item ($\delta_i$, $\alpha_i$) and respondent ($\theta_j$) parameters on
the latent continuum.

For one-dimensional tests, all calculations are based on the first two principal
components associated with the correlation matrix among the items. For items, (x, y) values are
pattern coefficients, and for respondents they are component scores. Call the (x, y) space
associated with these pattern coefficients the item space, and call the (x, y) space defined by the
component scores the respondent space.

The estimate of the item discrimination parameter ($\alpha_i$) is the length of the item vector in
the item space. For $\delta_i$ and $\theta_j$, the idea is to calculate arc lengths, in the item space for $\delta_i$ and in
the respondent space for $\theta_j$. Plots corresponding to the item space and the respondent space can
be arranged so the points range mostly in the first (x > 0, y > 0) and fourth (x > 0, y < 0)
quadrants (as in Figure 1).

Step 1: Fix a point on the unit circle. We use (1,0).

Step 2: Compute the projection length, r, as

$$r = \sqrt{x^2 + y^2}.$$

Step 3: Compute the arc length [1] of each item or subject from the fixed point (1,0):

$$\text{arc length} = \operatorname{sign}(y)\,\arccos(x / r),$$

where y is the pattern coefficient (item) or component score (respondent) associated with the
largest eigenvalue, and x is the pattern coefficient or component score associated with the second
largest eigenvalue. The value of sign(y) is 1 if y > 0 and -1 otherwise. The unit is radians.

These calculations give positive arc length for positive y and negative arc length for negative y.
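The three steps above can be sketched as a single helper. The function name `length_and_arc` is hypothetical, and the arccosine form of the arc length is an assumption: it is one formula consistent with the description (a signed angle from the fixed point (1,0), in radians), chosen for illustration.

```python
import math


def length_and_arc(x, y):
    """Return (r, arc) for a point (x, y) in the item or respondent space.

    r is the vector length; arc is the signed angle from (1, 0) in radians.
    """
    r = math.hypot(x, y)
    if r == 0.0:
        return 0.0, 0.0  # a point at the origin has no direction
    ratio = max(-1.0, min(1.0, x / r))  # clamp against rounding error
    sign = 1.0 if y > 0 else -1.0
    return r, sign * math.acos(ratio)


# Usage: x is the second-component value, y the first-component value.
r, arc = length_and_arc(0.5, -0.5)
print(r, arc)
```

For items, `r` would play the role of length and `arc` the role of arc-delta; for respondents, `arc` would play the role of arc-theta.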



In summary, let length stand for the item length, r, in the item space; arc-delta indicate
the arc length in the item space; and arc-theta stand for the arc length in the respondent space.
Then, length estimates starting values for $\alpha_i$, arc-delta estimates starting values for $\delta_i$, and
arc-theta estimates starting values for $\theta_j$.



The simulation study 

In order to investigate the effectiveness of proposed methods to estimate parameters of 
unidimensional unfolding model given in Equation 1, three test lengths were considered: 10, 20, 
and 40. Since generally in attitudinal surveys fewer items are used, small test size of 10 items 
was introduced in this study. Respondent's attitude, discrimination, and threshold parameters for 
simulated data were generated in a manner described in the dimensionality study. However, the 
delta parameters were generated from three different distributions to reflect realistic situations: 
item locations uniformly distributed over a wide range of the continuum (i.e., from -2 to +2); 
item locations more frequently distributed in the positive regions of the continuum (i.e., from -1 
to +2); and item locations distributed only in the positive regions of the latent continuum (i.e., 
from 0 to +2). Individual responses for a given test length and delta distribution were generated 
based on Equation 1 in exactly the same manner explained for the dimensionality study. 

For a given data set, three estimates (arc-theta, arc-delta, and length) were computed, and
for each parameter, correlations were computed between true ($\delta_i$, $\alpha_i$, and $\theta_j$) and estimated
values (arc-delta, length, arc-theta) for each condition of delta distribution. Moreover, for each
parameter, regression analysis of true values of the parameter on estimated values was performed
to see if the true scale of the parameter could be recovered. This process was replicated 100
times and the descriptive statistics of these estimates over 100 replications were computed.

[1] There are various formulas to compute the arc length that are mathematically equivalent.
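The recovery check for one replication can be sketched as below: a product moment correlation, plus a least-squares regression of true values on estimates. The function names are hypothetical, and the numbers in the usage example are made up to show the calculation; they are not results from the study.

```python
import math


def pearson_r(x, y):
    """Product moment correlation between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def regress_true_on_estimate(true_values, estimates):
    """Least squares fit: true = intercept + slope * estimate."""
    me = sum(estimates) / len(estimates)
    mt = sum(true_values) / len(true_values)
    sxy = sum((e - me) * (t - mt) for e, t in zip(estimates, true_values))
    sxx = sum((e - me) ** 2 for e in estimates)
    slope = sxy / sxx
    return slope, mt - slope * me


# Illustrative values: estimates ordered correctly but on a compressed scale.
true_delta = [-2.0, -1.0, 0.0, 1.0, 2.0]
arc_delta = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(pearson_r(true_delta, arc_delta))
print(regress_true_on_estimate(true_delta, arc_delta))
```

A slope of one and an intercept of zero would indicate that the estimates are already on the true metric; a high correlation with a different slope indicates correct ordering on a different scale.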

Results are reported in Tables 2, 3, and 4. 

Results of the simulation study for d=1

Table 2 shows results for the location estimate arc-delta. Table values are the averages 
over 100 replications and are grouped according to the three distributions of item locations. As 
can be seen, the product moment correlation coefficients between true ($\delta$) and estimated values
(arc-delta) are extremely high for all test sizes and for all three distributions, indicating very
strong association between the true and estimated values. However, as evidenced by the results
of the regression analyses, the metrics of the estimated and the true values are not always the same.
The slope and the intercept, although close to the desirable values of one and zero for the wide
distribution of deltas (-2 to 2), deviate from what is expected for the narrow distributions of
delta. Furthermore, the standard deviations of intercepts are large in all cases.

Table 3 shows results for the respondent estimates (arc-theta). The correlations between
$\theta$ and arc-theta are moderately high for all test sizes and in all three distributions. These
correlations, as expected, are much larger for the widely spread distribution of location
parameters (-2 to 2) than for other distributions of the location parameter. As in the case of
arc-delta, the regression results of $\theta$ on arc-theta show that the estimates are not on the same metric
as the original ($\theta$) parameter. This is indicated by average slopes that are much less than unity.
Additionally, the average intercept was greater than zero for the narrow distributions of deltas.

Table 4 shows results for estimates of the discrimination parameter, $\alpha$. As seen
previously for $\delta$ and $\theta$, the correlations between true and estimated values are moderate for all
test sizes and all delta distributions. Again, as seen before, the metric of the estimated values is
not the same as the true metric, as evidenced by the regression analysis results.

In summary, all three estimates, arc-theta, length, and arc-delta, have moderately high to
extremely high correlations with their respective true parameters. However, they are not on the
same metric as the original parameters. Several procedures were tried in an attempt to
recover the original metric of the parameters, but none of these procedures produced acceptable
results. Therefore, these efforts are not reported here.

Two-dimensional Case (d=2) 

As described in the dimensionality study, when there are two well-defined unfoldable 
dimensions, and the test is constructed to represent the entire continuum, principal components 
analysis of such data yields four dominant linear components. By observing the plots of the
pattern coefficients of the four components one can determine the pairs of components
associated with each dimension. In realistic situations neither dimensionality nor the items
associated with each dimension is known. Hence after determining dimensionality underlying the 
data, it is necessary to correctly identify component pairs associated with each dimension, and 
items that form each dimension, before estimating item and respondent parameters. 

In the two-dimensional unfolding case, there are four eigenvalues associated with the
four linear dimensions. However, for unfolding data, two eigenvalues and the associated pattern
coefficients define each unfoldable dimension. The two largest eigenvalues generally
correspond to the two unfoldable dimensions underlying the data. Each of the remaining two
eigenvalues is also associated with one of the two unfoldable dimensions. Similarly, each of the
first two sets of pattern coefficients is paired with one of the sets from the remaining
components to define a dimension, as shown through the simplex pattern of the pattern
coefficients. An example may illustrate this point. Figures 2 to 7 show six plots of factor pairs
associated with the four linear dimensions of a 40-item two-dimensional test. From observing these
plots it is clear that linear components 1 and 4 are associated with one unfolding dimension
(Figure 4), and components 2 and 3 are associated with the other unfolding dimension (Figure 5).
Plots in Figures 2, 3, 6, and 7 show mismatched components.

In terms of matching component pairs, since the largest components 1 and 2 are each
associated with a different unfoldable dimension, they cannot be paired with each other.
Similarly, components 3 and 4 cannot be paired together. Therefore the only possibilities for correct
matching are the component pairs 1 and 3, 1 and 4, 2 and 3, or 2 and 4. Two out of these four
are correct matches. Hence for a given test, the task is to identify component pairs and items
associated with each dimension.
Several methods were tried to identify correct pairs of components and to correctly match
items to pairs. The following algorithm seems most promising. At present this algorithm has
been applied only to test items resembling a simple structure scenario. That is, each item is
determined by only one attitude, either $\theta_1$ or $\theta_2$.

Algorithm 2: Algorithm to match items to correct component pairs based on "the weighted
correspondent angle."

Here the idea is to first identify the pair of components that correspond to each
simplex and then assign items to the appropriate simplex. An index called "the weighted
correspondent angle" is computed for each item based on its length and the angle it makes with
the nearest reference axis. The standard deviation of the weighted correspondent angles is
generally larger for the pair of components that define a simplex pattern than for other
component pairs. Once the component pairs associated with simplexes are identified, items are
assigned to simplexes based on their length. The following steps are used to identify correct
component pairs and match items to the correct simplex.



Step 1. For each component pair, consider only items in the first and fourth quadrants, where
the simplex pattern is situated.

Step 2. Compute the angle between each included point and its closest reference axis. Multiply
the length of the item vector by the angle it makes with the closest axis. Call this "the
weighted correspondent angle."

Step 3. Compute the standard deviation of the weighted correspondent angles over all items.

Step 4. Compare the standard deviations of the weighted correspondent angles for all logical component pairs:
(1,3), (1,4), (2,3), and (2,4). The component pair associated with the largest standard
deviation provides the correct combination of components
associated with one of the unfolding dimensions. The component pair for the second
dimension is its complement pair. For example, if the component pair (1,3) is associated
with one dimension, then the component pair (2,4) is associated with the other dimension.

Step 5. Assign items to one of the two component pairs (i.e., simplexes) selected in Step
4: find the length of the item vector in each simplex, and assign the item to the simplex where
the length is largest.
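The five steps can be sketched as follows. Several details are assumptions made for illustration and are not stated in the text: within a pair, the lower-numbered component is treated as y and the higher-numbered as x (following the convention of Algorithm 1), the first and fourth quadrants are taken to mean x > 0, and the population standard deviation is used. All names are hypothetical.

```python
import math
import statistics

CANDIDATE_PAIRS = [(1, 3), (1, 4), (2, 3), (2, 4)]
COMPLEMENT = {(1, 3): (2, 4), (2, 4): (1, 3), (1, 4): (2, 3), (2, 3): (1, 4)}


def weighted_angle_sd(loadings, pair):
    """Steps 1-3: SD of (vector length x angle to the closest axis)."""
    first, second = pair
    values = []
    for row in loadings:
        y, x = row[first - 1], row[second - 1]
        if x <= 0:
            continue  # keep only the first and fourth quadrants (assumed x > 0)
        phi = abs(math.atan2(y, x))          # angle from the x axis
        angle = min(phi, math.pi / 2 - phi)  # angle to the closest axis
        values.append(math.hypot(x, y) * angle)
    return statistics.pstdev(values) if len(values) > 1 else 0.0


def match_items(loadings):
    """Steps 4-5: choose the simplex pairs, then assign each item to one."""
    best = max(CANDIDATE_PAIRS, key=lambda p: weighted_angle_sd(loadings, p))
    pairs = (best, COMPLEMENT[best])
    groups = []
    for row in loadings:
        lengths = [math.hypot(row[a - 1], row[b - 1]) for a, b in pairs]
        groups.append(0 if lengths[0] >= lengths[1] else 1)
    return pairs, groups
```

Here `loadings` is a list with one row of four pattern coefficients per item, and `groups[i]` indexes into `pairs` to give the simplex assigned to item i.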



The simulation study for d=2

A simulation study was undertaken to investigate the effectiveness of the proposed
algorithm to identify the component pairs defining each simplex and to correctly match items in
each simplex. Individual item responses were generated in exactly the same manner as
explained in the dimensionality study for the two-dimensional data. The test length was fixed at
40 items and the number of respondents was fixed at 1000. Three distributions of item locations
were considered as before: item locations uniformly distributed over the entire range of the
continuum from -2 to +2; item locations more frequently distributed in the positive regions from
-1 to +2; and item locations distributed only in the positive regions of the latent continuum from
0 to +2. Algorithm 2 was applied to the data set in each distribution category, and results are
reported in Table 5.

Results of the simulation study for d=2 

Table 5 shows results for identifying the correct component pairs (i.e., simplexes) and 
matching items to the correct simplex. The cell values are the percentages of the 100 
replications in which all 40 items were correctly matched to dimensional groups after the two 
fan-like simplex patterns were identified. The percentages are high in all cases (88 to 94), 
indicating the usefulness of Algorithm 2. 
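
The cell values of Table 5 can be computed along the following lines. This is a minimal sketch with hypothetical function names; it assumes that group membership is coded 0 or 1 and that a replication still counts as correct when the two group labels are interchanged, since the numbering of the dimensions is arbitrary. The paper does not state how label order was handled.

```python
def all_items_correct(true_groups, assigned_groups):
    """True if every item is in the right group, allowing the two
    group labels (0 and 1) to be interchanged."""
    pairs = list(zip(true_groups, assigned_groups))
    same = all(t == a for t, a in pairs)
    swapped = all(t != a for t, a in pairs)
    return same or swapped

def percent_fully_correct(true_groups, replications):
    """Percentage of replications in which all items were classified correctly."""
    hits = sum(all_items_correct(true_groups, rep) for rep in replications)
    return 100.0 * hits / len(replications)
```

Under this scoring a replication with even one misplaced item counts as a failure, so the percentage is a strict criterion.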

In the replications where not all items were classified into the correct groups, the number 
of misclassified items was small in most cases. However, there were a few replications (1 to 3 
out of 100), especially when the deltas ranged between -1 and 2, where about half of the items 
were misclassified. In realistic situations, one hopes that the content of the items and prior 
experience can aid in classifying such items into the right simplex. 

Summary and Discussion 

Results reported in this study are part of ongoing research on dimensionality assessment 
and on the development of start values for parameter estimation with test data generated from 
attitudinal items that follow the generalized graded unfolding model. One- and two-dimensional 
simulated test data were analyzed in this study. Results so far are very encouraging. Principal 
components analysis provides a reliable estimate of the underlying dimensionality of unfolding 
data; namely, two linear dimensions are generated for every unfoldable dimension. In the case of 
unidimensional unfolding data, the parameter estimates arc-theta, arc-delta, and length have 
high correlations with the true parameters, indicating their usefulness as start values. It is, 
however, desirable to recover the true metric of these parameters, or at least a common metric 
for all parameter estimates. 
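
As a rough illustration of how such start values might be obtained for one item, the sketch below converts a pair of pattern coefficients into an angular position and a vector length. The function name is hypothetical, and using the polar angle as a stand-in for the arc position along the semicircle is an assumption; the exact arc-length definition used in the study may differ.

```python
import math

def item_start_values(p1, p2):
    """Return (arc, length) for an item with pattern coefficients p1 and p2
    on the two principal components associated with one unfolding dimension."""
    # Length of the item vector: candidate start value for discrimination.
    length = math.hypot(p1, p2)
    # Polar angle in radians: assumed stand-in for the item's arc position,
    # a candidate start value for item location.
    arc = math.atan2(p2, p1)
    return arc, length
```

Values obtained this way are on an arbitrary metric, which is why a regression of the true parameters on the estimates, as summarized in Tables 2 to 4, is needed to judge how closely they track the generating values.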

For two-dimensional test data, several methods were attempted to identify the correct pair 
of components defining each unfoldable dimension and to classify items correctly into these 
dimensions. Only Algorithm 2 has shown promise, and the results in Table 5 are very 
encouraging. This algorithm needs to be validated further on a broad variety of tests resembling 
realistic two-dimensional situations, with varied numbers of items in each dimension and 
correlations between attitudinal dimensions. 

This study is limited in many ways. These results cannot be generalized until a 
full-scale simulation study is completed. Future studies will also focus on estimating the item 
threshold parameters in the generalized graded unfolding model, in addition to the location and 
discrimination parameters. Other important aims of future studies are to recover the 
metric of the original parameters (or at least a common metric among parameters) and to 
compute initial estimates for two-dimensional tests. 

With the recent emergence of a large class of IRT models for unfolding data, it is 
important to have methods to determine the fit of the models to data before applying these 
models. Given the sparse literature on dimensionality assessment for unfolding data, it is 
critical to perform a thorough investigation of dimensionality assessment in this area. 

Table 2: Summary of Results for arc-delta* 

                             -2 < delta < 2         -1 < delta < 2          0 < delta < 2 
test size                 10     20     40       10     20     40       10     20     40 

r(delta, arc-delta)     0.98   0.99   0.99     0.98   0.99   0.99     0.95   0.97   0.96 
R-squared               0.97   0.98   0.98     0.96   0.96   0.97     0.91   0.93   0.93 

Regression of delta on arc-delta 
  intercept             0.05   0.00   0.02     0.00   0.00   0.00     0.04  -0.38  -0.50 
  slope                 0.95   0.94   0.95     0.90   0.89   0.90     0.95   1.00   1.00 
  sd(int)               0.31   0.21   0.15     0.48   0.37   0.34     0.98   0.56   0.13 
  sd(slope)             0.10   0.04   0.03     0.10   0.08   0.03     0.10   0.06   0.05 

*Except where identified by "sd," entries in all tables are averages over 100 replications. 
Entries marked "sd" are standard deviations over 100 replications. 



Table 3: Summary of Results for arc-theta* 

                             -2 < delta < 2         -1 < delta < 2          0 < delta < 2 
test size                 10     20     40       10     20     40       10     20     40 

r(theta, arc-theta)     0.84   0.92   0.95     0.73   0.85   0.88     0.71   0.79   0.82 
R-squared               0.72   0.85   0.90     0.60   0.75   0.81     0.53   0.63   0.68 

Regression of theta on arc-theta 
  intercept             0.01   0.00   0.01     0.00   0.01   0.00     0.10   0.13  -0.13 
  slope                 0.55   0.62   0.65     0.48   0.57   0.59     0.46   0.52   0.55 
  sd(int)               0.11   0.11   0.09     0.11   0.13   0.17     0.05   0.04   0.04 
  sd(slope)             0.11   0.04   0.02     0.16   0.12   0.11     0.11   0.08   0.05 



Table 4: Summary of Results for length 

                             -2 < delta < 2         -1 < delta < 2          0 < delta < 2 
test size                 10     20     40       10     20     40       10     20     40 

r(alpha, length)        0.83   0.87   0.85     0.84   0.84   0.84     0.79   0.84   0.85 
R-squared               0.70   0.75   0.73     0.72   0.71   0.72     0.64   0.71   0.72 

Regression of alpha on length 
  intercept            -3.29  -2.33  -1.87    -3.22  -2.09  -1.67    -3.09   2.22  -1.85 
  slope                 5.69   4.69   4.16     5.67   4.42   3.94     5.33   4.52   4.12 
  sd(int)               1.22   0.47   0.36     1.12   0.55   0.32     1.53   0.57   0.33 
  sd(slope)             1.50   0.61   0.47     1.38   0.71   0.42     1.90   0.71   0.42 



Table 5: Percentage of Correct Classification of Items to Dimensions 

                 -2 < delta < 2     -1 < delta < 2      0 < delta < 2 
test size              40                 40                 40 

Algorithm 3            94                 88                 91 

Table 1: Results of Dimensionality Analysis of Unfolding Data 

eigenvalues (PCOB)*** 

31.10, 14.61, 1.06, 1.04, .77, .76 
7.94, 4.34, .71, .65, .57, .53 

17.38, 14.94, 6.68, 6.49, .95, .86 
8.75, 8.26, 3.58, 3.21, .80, .74 
5.40, 4.7, 2.07, 1.26, .74, .70 

20.32, 9.8, 3.48, 2.60, .98, .96 
23.38, 14.10, 1.81, 1.42, .97, .94 

15.62, 13.40, 8.10, 6.06, 1.25, .96 
19.31, 11.39, 8.63, 6.39, .93, .85 

23.82, 10.64, 2.38, 1.66, 1.00, .96 

28.08, 14.65, 1.17, 1.11, .95, .91 

* s1 and s2 are the weights given to the two dimensions. s1 = 0.75 and s2 = 0.25 denotes that 
theta1 was given higher weight than theta2. 

Figure 1. Plot of pat1*pat2$lab. Symbol used is 
Figure 2. Plot of pat1*pat2$lab. Symbol used is '*'. 
Figure 3. Plot of pat1*pat3$lab. Symbol used is 
Figure 4. Plot of pat1*pat4$lab. Symbol used is '*'. 
Figure 5. Plot of pat2*pat3$lab. Symbol used is 
Figure 6. Plot of pat2*pat4$lab. Symbol used is 
Figure 7. Plot of pat3*pat4$lab. Symbol used is 